Polycarpa aurata

Jacob Mora



My project this semester analyzed the microbial community of the tunicate Polycarpa aurata (a marine chordate) in order to build a phylogenetic tree of the bacteria living in this organism. The samples were sequenced on an Illumina platform, and the reads were analyzed in R with the DADA2 and phyloseq packages.

My Data

A total of 176 samples were collected from 11 different locations to learn more about their microbial communities. Polycarpa aurata is native to the tropical eastern Indian Ocean and the western Pacific Ocean, and this particular batch of samples was collected in the region around the Philippines and Indonesia. Each dot on the map represents a different location where Polycarpa aurata samples were collected.

Workflow

To clean the data, I followed the workflow described by Dr. Zahn in his paper, “Marker Genes (16S and ITS) Protocol for Plant Microbiome Analyses.”


Remove Primers

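The first step in the process is removing the primers from the sequences. Primers are short synthetic sequences used to amplify the target region during PCR. They are not part of the biological sequence, so they must be removed before the reads can be analyzed properly. In these samples, the forward primer was “GTGCCAGCMGCCGCGGTAA” and the reverse primer was “GGACTACHVGGGTWTCTAAT”. These are the 515F/806R pair, which targets the V4 region of the 16S rRNA gene. The letters M, H, V, and W are IUPAC codes for degenerate positions, where more than one base is allowed. Once the primers were removed, I could move on to quality filtration.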
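Here is a minimal sketch of what primer removal involves. The Python below expands the degenerate IUPAC codes in each primer into a regular expression and strips a primer found at the very start of a read. It is only an illustration. It assumes that reads begin with the primer and contain no insertions or deletions within it, and the function names are hypothetical. Real workflows typically use a dedicated tool such as cutadapt, which tolerates mismatches and can also find primers at the 3' end of reads.

```python
import re

# IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

FWD_PRIMER = "GTGCCAGCMGCCGCGGTAA"
REV_PRIMER = "GGACTACHVGGGTWTCTAAT"


def primer_regex(primer):
    """Compile a regex matching the (degenerate) primer at the start of a read."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer))


def trim_primer(read, pattern):
    """Return the read with its leading primer removed, or None if the primer is absent."""
    match = pattern.match(read)
    if match is None:
        return None  # reads without the primer are discarded in this sketch
    return read[match.end():]
```

For example, `trim_primer(read, primer_regex(FWD_PRIMER))` returns everything after the first 19 bases of a forward read when those bases match the primer.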

In the quality profile plots, the green line is the mean quality score at each position and the orange line is the median. The red line is the scaled proportion of reads that extend to at least that position. The dashed lines are the 25th and 75th percentiles.

Quality Filtration

As the quality profiles show, quality drops toward the ends of the reads, so the reads must be trimmed. Forward reads are usually of higher quality than reverse reads, and this must also be taken into account when choosing where to trim. For these samples, I truncated the forward reads at position 250 and the reverse reads at position 160. This kept the quality score above 20, which corresponds to a 1 in 100 chance of an incorrect base call (99% accuracy).
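The Phred quality score Q is defined by P = 10^(-Q/10), where P is the probability that a base call is wrong. Q20 therefore means a 1 in 100 error probability, and Q30 means 1 in 1,000. The Python sketch below decodes FASTQ quality strings, assuming the standard Phred+33 encoding. It then finds the first position at which the median quality across reads falls below a threshold, which is one hypothetical way to pick a truncation length. In DADA2 itself, the chosen lengths are passed to `filterAndTrim()` through its `truncLen` argument.

```python
from statistics import median


def phred_to_error(q):
    """Probability that a base call is wrong: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_quality(qual_line, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in qual_line]


def truncation_point(quality_lines, min_q=20):
    """Hypothetical helper: number of leading positions whose median quality
    across all reads stays at or above min_q."""
    decoded = [decode_quality(line) for line in quality_lines]
    length = min(len(q) for q in decoded)
    for pos in range(length):
        if median(q[pos] for q in decoded) < min_q:
            return pos  # truncate just before the first low-quality position
    return length
```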

ASV Inference

I then moved on to the ASV inference part of the workflow. ASVs (amplicon sequence variants) are the exact DNA sequences inferred to be truly present in the samples after sequencing errors have been corrected. Two ASVs can differ by as little as a single nucleotide. Before ASVs can be inferred, an error model must be built to correct the mistakes made during sequencing.
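DADA2 records the inferred ASVs as a table of counts, built with `makeSequenceTable()`, with one row per sample and one column per exact sequence. The Python sketch below shows only that final tallying step and assumes the reads have already been denoised. The function name `make_sequence_table` is hypothetical.

```python
from collections import Counter


def make_sequence_table(samples):
    """Build a sample-by-ASV count table from already-denoised reads.

    samples: sample name -> list of error-corrected read sequences
    Returns (asvs, table), where table maps each sample to one count per ASV.
    """
    counts = {name: Counter(reads) for name, reads in samples.items()}
    # Every distinct sequence seen in any sample becomes one ASV column.
    asvs = sorted({seq for c in counts.values() for seq in c})
    # Counter returns 0 for sequences a sample does not contain.
    table = {name: [c[seq] for seq in asvs] for name, c in counts.items()}
    return asvs, table
```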

The next figure shows the error model learned from these reads.

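As with any model, the estimated error rates should closely match the observed error rates, and the error rates should decline as the quality score increases. The error model is built with DADA2's learnErrors() function. This function learns the error rates directly from the data by alternating between estimating error rates and inferring the sample composition until the two converge. With this error model, I was able to “denoise” the sequences. Denoising uses the learned error rates to distinguish true biological sequences from sequencing errors and to correct those errors. In DADA2, this step is performed by the dada() function.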
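The Python sketch below makes the comparison between observed and estimated error rates concrete. It tallies observed error rates at each quality score from (true base, called base, quality) triples, and compares them with the nominal rate 10^(-Q/10) implied by the Phred definition. This is a simplified illustration, not DADA2's algorithm. `learnErrors()` estimates a separate rate for each type of substitution (for example, A to G) and fits a smooth curve across quality scores, whereas this sketch pools all substitutions together. The function names are hypothetical.

```python
from collections import defaultdict


def observed_error_rates(aligned):
    """Observed substitution rate at each quality score.

    aligned: iterable of (true_base, called_base, quality) triples, e.g. each
    read base paired with the base of the denoised sequence it came from.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for true_base, called_base, q in aligned:
        totals[q] += 1
        if called_base != true_base:
            errors[q] += 1
    return {q: errors[q] / totals[q] for q in sorted(totals)}


def nominal_error_rate(q):
    """Error rate implied by the Phred definition of a quality score."""
    return 10 ** (-q / 10)
```

A well-behaved model shows observed rates that fall as Q rises and stay close to the estimated curve.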

Remove Chimeras

The next step in the process is the removal of chimeras. Chimeras are artificial sequences formed during PCR. An incompletely extended strand from one template anneals to a different template and is extended, producing a hybrid of two or more parent sequences. Because they do not come from any real organism, they must be removed to further refine the data before analysis.
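DADA2 provides `removeBimeraDenovo()` for this step. It flags a sequence as a bimera (a two-parent chimera) when the sequence can be reconstructed by joining the left part of one more abundant sequence to the right part of another. The toy Python version below implements that idea with exact matching and assumes that all sequences are aligned and the same length. The `min_fold` abundance threshold is an illustrative assumption, not a documented default, and the function names are hypothetical.

```python
def is_bimera(query, parents):
    """True if query = left part of one parent + right part of another parent.

    Toy exact-match version: sequences are assumed aligned and equal length.
    """
    n = len(query)
    for left in parents:
        for right in parents:
            if left == right or len(left) != n or len(right) != n:
                continue
            # Try every breakpoint k between the two parent segments.
            for k in range(1, n):
                if query[:k] == left[:k] and query[k:] == right[k:]:
                    return True
    return False


def remove_bimeras(abundances, min_fold=2):
    """Drop sequences that are bimeras of parents at least min_fold times more abundant.

    abundances: sequence -> read count
    """
    kept = {}
    for seq, count in abundances.items():
        parents = [p for p, c in abundances.items() if p != seq and c >= min_fold * count]
        if not is_bimera(seq, parents):
            kept[seq] = count
    return kept
```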

After removing the chimeras, I removed everything that was not bacterial. This included mitochondrial, fungal, and other eukaryotic sequences, as well as sequences found in the negative controls.
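The filtering described above can be sketched in Python as follows. The sketch assumes a taxonomy table with SILVA-style rank names (Kingdom, Family, and so on), in which mitochondria appear under the family name "Mitochondria". The negative-control rule used here, dropping any ASV seen in a control, is deliberately crude and is an assumption for illustration. Packages such as decontam take a statistical approach instead. The function name `keep_bacterial` is hypothetical.

```python
def keep_bacterial(taxonomy, counts, negative_controls):
    """Return the ASVs that survive the non-bacterial and negative-control filters.

    taxonomy: ASV -> {rank name: taxon}, e.g. {"Kingdom": "Bacteria", "Family": "Vibrionaceae"}
    counts: sample name -> {ASV: read count}
    negative_controls: names of the negative-control samples
    """
    # Any ASV with reads in a negative control is treated as a contaminant (assumed rule).
    in_controls = {
        asv
        for sample in negative_controls
        for asv, n in counts.get(sample, {}).items()
        if n > 0
    }
    kept = []
    for asv, ranks in taxonomy.items():
        if ranks.get("Kingdom") != "Bacteria":
            continue  # removes eukaryotic (including fungal), archaeal and unassigned ASVs
        if ranks.get("Family") == "Mitochondria":
            continue  # SILVA classifies mitochondria as a family within the bacteria
        if asv in in_controls:
            continue
        kept.append(asv)
    return kept
```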
Build Phylogeny

Once this was done, I moved on to building the phylogenetic trees. I assigned taxonomy using the SILVA v138.1 database, and then plotted and formatted the trees using ggtree.
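This section does not say which method was used to infer the tree, so the Python below uses UPGMA purely as an illustrative stand-in. UPGMA is one of the simplest distance-based methods. It repeatedly merges the two closest clusters and averages their distances, here using the proportion of differing sites (p-distance) between pre-aligned, equal-length sequences. Its equal-rates assumption rarely holds for real bacterial sequences, which is why neighbor-joining or maximum-likelihood methods are more common in practice. The sketch returns the tree in Newick format, which R tools such as ape's `read.tree()` can load for plotting with ggtree. Internal labels such as `node0` are hypothetical and assume that no sequence shares those names.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of sites that differ between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Build a rooted UPGMA tree from aligned sequences and return it as Newick.

    seqs: name -> aligned sequence (all the same length)
    """
    # Each cluster maps to (Newick subtree, number of leaves, height of its root).
    clusters = {name: (name, 1, 0.0) for name in seqs}
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    next_id = 0
    while len(clusters) > 1:
        # Merge the closest pair; the new node sits at half their distance.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        height = dist.pop(pair) / 2
        (tree_a, n_a, h_a), (tree_b, n_b, h_b) = clusters.pop(a), clusters.pop(b)
        merged = f"node{next_id}"  # hypothetical internal label
        next_id += 1
        subtree = f"({tree_a}:{height - h_a:.4f},{tree_b}:{height - h_b:.4f})"
        # Distance to each remaining cluster is the size-weighted average.
        for c in clusters:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        clusters[merged] = (subtree, n_a + n_b, height)
    return next(iter(clusters.values()))[0] + ";"
```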

Class Tree

The end product is a phylogenetic tree of the bacteria present in Polycarpa aurata. This tree shows the phylogeny resolved down to the class level.


The longer the line, the more abundant that taxon was in the samples.
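Assume the line lengths are scaled to relative abundance. The per-taxon values such a tree displays can then be computed by pooling ASV counts within each taxon at the chosen rank and converting the totals to proportions. This is similar in spirit to phyloseq's `tax_glom()` followed by a conversion to proportions. The Python sketch below is a hypothetical stand-in for that step. The same function works at the family level by passing `"Family"` as the rank.

```python
from collections import defaultdict


def abundance_by_rank(asv_counts, taxonomy, rank):
    """Pool ASV read counts by taxon at `rank` and return relative abundances.

    asv_counts: ASV -> total read count across samples
    taxonomy: ASV -> {rank name: taxon}
    """
    totals = defaultdict(int)
    for asv, count in asv_counts.items():
        # ASVs not classified at this rank are grouped together.
        taxon = taxonomy[asv].get(rank) or "Unassigned"
        totals[taxon] += count
    grand_total = sum(totals.values())
    return {taxon: n / grand_total for taxon, n in totals.items()}
```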

Family Tree

The next tree shows the phylogeny of the bacteria present in the samples, resolved down to the family level.

Next Steps

Next, we want to find out whether the microbial diversity of Polycarpa aurata changes with location. Our metadata include GPS coordinates, and the goal is to determine whether there are statistically significant differences in diversity among these locations.
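One common way to quantify the diversity within each sample is the Shannon index, H' = -sum(p_i * ln(p_i)), where p_i is the proportion of a sample's reads belonging to ASV i. phyloseq's `estimate_richness()` can compute it. The Python sketch below computes the index per sample and averages it by location, assuming a hypothetical mapping from sample names to location labels. Deciding whether the differences are real would still require a statistical test across locations, such as a Kruskal-Wallis test.

```python
import math
from collections import defaultdict
from statistics import mean


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over ASVs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def mean_diversity_by_location(sample_counts, sample_location):
    """Average Shannon index per location.

    sample_counts: sample name -> list of ASV counts for that sample
    sample_location: sample name -> location label (hypothetical metadata column)
    """
    by_location = defaultdict(list)
    for sample, counts in sample_counts.items():
        by_location[sample_location[sample]].append(shannon(counts))
    return {location: mean(values) for location, values in by_location.items()}
```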